Machine learning for prediction of postoperative nausea and vomiting in patients with intravenous patient-controlled analgesia

Background Postoperative nausea and vomiting (PONV) is still a highly relevant problem and is known to be a distressing side effect in patients. The aim of this study was to develop a machine learning model to predict PONV up to 24 h after surgery in patients receiving fentanyl-based intravenous patient-controlled analgesia (IV-PCA). Methods Between July 2019 and July 2020, data from 2,149 patients who received fentanyl-based IV-PCA for analgesia after non-cardiac surgery under general anesthesia were used to develop predictive models. The rates of PONV at 1 day after surgery were measured according to patient characteristics as well as anesthetic, surgical, or PCA-related factors. All statistical analyses and computations were performed using the R software. Results A total of 2,149 patients were enrolled in this study, 337 of whom (15.7%) experienced PONV. After applying the machine learning algorithms and the Apfel model to the test dataset to predict PONV, we found that the area under the receiver operating characteristic curve was 0.576 (95% confidence interval [CI], 0.520–0.633) for logistic regression, 0.597 (95% CI, 0.537–0.656) for k-nearest neighbor, 0.561 (95% CI, 0.498–0.625) for decision tree, 0.610 (95% CI, 0.552–0.668) for random forest, 0.580 (95% CI, 0.520–0.639) for gradient boosting machine, 0.649 (95% CI, 0.592–0.707) for support vector machine, 0.686 (95% CI, 0.630–0.742) for artificial neural network, and 0.643 (95% CI, 0.596–0.690) for the Apfel model. Conclusions We developed and validated machine learning models for predicting PONV in the first 24 h. The artificial neural network and support vector machine models showed better performance than the Apfel model in predicting PONV.

Background
Postoperative nausea and vomiting (PONV) is a common condition and is known to be a distressing side effect in patients [1]. The incidence of PONV is approximately 30% and can be as high as 80% in high-risk patients [2,3]. Although the mechanism of PONV is not clear, the use of perioperative opioids is known to be associated with it [4]. Nonetheless, opioid-based intravenous patient-controlled analgesia (IV-PCA) currently plays an important role in routine postoperative analgesic therapy [5–7]. Therefore, accurate prediction of PONV would allow patients to be warned of their risk and would assist clinicians in making decisions about preventive treatment.
Apfel's risk score is a simple assessment tool derived to predict the 24-h rates of PONV [8,9]. However, the Apfel model does not guarantee accurate prediction of the risk of PONV, as its discrimination and calibration properties are limited [10,11]. Recently, studies have used dynamic predictive models or machine learning to improve the prediction of PONV [12–15].
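As an illustration of how simple the score is, the following Python sketch computes the simplified Apfel score. It assumes the four risk factors of the commonly cited simplified score (female sex, non-smoking status, history of PONV or motion sickness, and postoperative opioid use) and the commonly cited risk estimates for 0 to 4 factors; neither is restated in this article, and the function and variable names are hypothetical.

```python
# Simplified Apfel score: one point per risk factor that is present.
# ASSUMPTION: the risk estimates below are the commonly cited values for
# 0-4 risk factors; they are not reported in this article.
APFEL_RISK_BY_SCORE = {0: 0.10, 1: 0.21, 2: 0.39, 3: 0.61, 4: 0.79}


def apfel_score(female, nonsmoker, history_ponv_or_motion_sickness,
                postoperative_opioids):
    """Count how many of the four risk factors are present (0 to 4)."""
    factors = (female, nonsmoker, history_ponv_or_motion_sickness,
               postoperative_opioids)
    return sum(1 for present in factors if present)


def apfel_risk(score):
    """Look up the estimated 24-h PONV risk for a given score."""
    return APFEL_RISK_BY_SCORE[score]


# Hypothetical patient: female non-smoker, no PONV history, receiving opioids.
example_score = apfel_score(True, True, False, True)
example_risk = apfel_risk(example_score)
print(example_score, example_risk)
```

Because every patient in a fentanyl-based IV-PCA cohort receives postoperative opioids, that factor would presumably be present for all of them, which leaves only three factors to separate patients.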
Machine learning is an application of artificial intelligence, whereby a computer algorithm automatically learns and improves from prior experience [16]. After sufficient training with known input and output values, a machine learning algorithm produces an inferred function that can be used to make predictions on new data [17]. It may therefore be used for prediction in the medical field. Recently, machine learning algorithms have shown high performance in various fields of medicine, such as diagnosis, prognosis, and clinical decision support [18–21].
To the best of our knowledge, no previous study has compared the performance of the Apfel model and machine learning methods in predicting PONV in the 24 h after surgery. We expect our research results to improve the prediction of PONV and the quality of patient care.

Study population
We collected data from patients (>19 years) after non-cardiac surgery under general anesthesia who received fentanyl-based IV-PCA at Kangbuk Samsung Hospital between July 2019 and July 2020. The exclusion criteria for this study were refusal to receive PCA and admission to the intensive care unit. This study was reviewed and approved by the Institutional Review Board (IRB No. 2020-08-001) of Kangbuk Samsung Hospital (Seoul, Korea). This study was conducted in accordance with the principles of the Declaration of Helsinki of the World Medical Association. The need for written informed consent was waived as this was a retrospective study of electronic medical records.

Data collection
The occurrence of PONV at 1 day after surgery was recorded, together with postoperative pain scores and other complications, by the PCA team in our hospital. We also included patient characteristics as well as anesthetic, surgical, or PCA-related factors in the predictive models.
The continuous variables were age, body mass index (BMI), duration of anesthesia, and dosage of fentanyl in IV-PCA. The categorical variables were sex, history of motion sickness or PONV, American Society of Anesthesiologists (ASA) physical status, diabetes mellitus, hypertension, premedication, use of preintubation opioids, anesthetic agents (sevoflurane, desflurane, or total intravenous anesthesia [TIVA]), intraoperative remifentanil infusion, the use of intraoperative opioids (fentanyl or meperidine), emergency operation, laparoscopic surgery, type of surgery, adjuvant nefopam, and antiemetic (ramosetron) in IV-PCA. Continuous variables were transformed to values between 0 and 1 by minimum-maximum normalization, implemented in the caret package in the R software.
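Minimum-maximum normalization maps each value x to (x - min) / (max - min). The Python sketch below is a minimal illustration of that formula, not the authors' R code; the function name and the example ages are hypothetical.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so the smallest maps to 0 and the largest to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column carries no information; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical ages in years, for illustration only.
ages = [20, 35, 50, 80]
scaled_ages = min_max_normalize(ages)
print(scaled_ages)
```

In the caret package, this kind of scaling corresponds to `preProcess` with `method = "range"`; whether the minimum and maximum were taken from the training set alone is not stated in the text.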

Feature selection
Feature selection is the process of selecting the features that contribute most to the outcome being predicted, which can improve model performance. We used recursive feature elimination with a random forest as the underlying model: the model is fitted, the features are ranked by the model's feature importance, and a small number of the weakest features are eliminated in each iteration until the specified number of features is reached. To enable the machine learning algorithms to run efficiently, we trained our machine learning models using only the features retained by recursive feature elimination.
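The elimination loop can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the importance function is a simple stand-in (absolute Pearson correlation with the outcome) instead of random forest importance, and the feature names and toy data are hypothetical.

```python
from statistics import mean


def abs_correlation_importance(columns, outcome):
    """Stand-in importance: |Pearson r| of each feature with the outcome.

    ASSUMPTION: the study ranked features with a random forest; this
    univariate score is used here only to keep the sketch self-contained.
    """
    scores = {}
    mean_y = mean(outcome)
    var_y = sum((y - mean_y) ** 2 for y in outcome)
    for name, xs in columns.items():
        mean_x = mean(xs)
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, outcome))
        if var_x == 0 or var_y == 0:
            scores[name] = 0.0  # a constant column cannot be correlated
        else:
            scores[name] = abs(cov / (var_x * var_y) ** 0.5)
    return scores


def recursive_feature_elimination(columns, outcome, n_keep,
                                  importance=abs_correlation_importance,
                                  step=1):
    """Repeatedly re-rank the remaining features and drop the weakest ones."""
    remaining = dict(columns)
    while len(remaining) > n_keep:
        scores = importance(remaining, outcome)  # re-fit on what is left
        n_drop = min(step, len(remaining) - n_keep)
        for name in sorted(scores, key=scores.get)[:n_drop]:
            del remaining[name]
    return list(remaining)


# Toy data, for illustration only.
toy_outcome = [1, 0, 1, 0, 1, 0, 0, 0]
toy_columns = {
    "feature_a": [1, 0, 1, 0, 1, 0, 0, 0],
    "feature_b": [1, 0, 1, 0, 0, 0, 1, 0],
    "feature_c": [0, 1, 0, 1, 1, 0, 1, 0],
    "feature_d": [1, 1, 1, 1, 1, 1, 1, 1],
}
selected = recursive_feature_elimination(toy_columns, toy_outcome, n_keep=2)
print(selected)
```

With a model-based importance such as that of a random forest, the ranking can change after each elimination because the model is refitted on the remaining features; with the univariate stand-in used here, the ranking is the same in every loop.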

Model assessment
To assess predictive ability, we compared the machine learning approaches with the Apfel model in terms of the area under the receiver operating characteristic curve (AUROC). The receiver operating characteristic curve was plotted using the test dataset to show the tradeoff in performance across different threshold values in an imbalanced classification problem. We also compared the accuracy, sensitivity, and specificity.
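The AUROC can be read as the probability that a randomly chosen patient with PONV receives a higher predicted score than a randomly chosen patient without PONV, with ties counted as one half. The Python sketch below computes it directly from that definition; it is an illustration only, and the function name, labels, and scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    correctly_ranked = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correctly_ranked += 1.0
            elif p == n:
                correctly_ranked += 0.5  # ties count as half
    return correctly_ranked / (len(positives) * len(negatives))


# Toy example: 1 = PONV, 0 = no PONV; scores are predicted probabilities.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. Because the measure depends only on how cases are ranked, it does not require choosing a classification threshold.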
The confusion matrix is used to summarize the performance of a classification model, as shown in Table 1. Accuracy, sensitivity, and specificity are described in terms of true positives (TP), true negatives (TN), false negatives (FN), and false positives (FP). The accuracy of a model is the ratio of correct predictions to total predictions made and is defined as
Accuracy = (TN + TP) / (TN + TP + FN + FP)
The sensitivity of a model is the proportion of actual positive cases that are correctly identified and is defined as
Sensitivity = TP / (TP + FN)
The specificity of a model is the proportion of actual negative cases that are correctly identified and is defined as
Specificity = TN / (TN + FP)
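These three definitions translate directly into code. The Python sketch below is a minimal illustration with hypothetical function names and toy labels (1 = PONV, 0 = no PONV).

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary labels coded as 1 and 0."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity as defined in the text."""
    return {
        "accuracy": (tn + tp) / (tn + tp + fn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels, for illustration only.
toy_actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_predicted = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
toy_counts = confusion_counts(toy_actual, toy_predicted)
toy_metrics = classification_metrics(*toy_counts)
print(toy_counts, toy_metrics)
```

Accuracy alone can be misleading when the outcome is uncommon. With a PONV incidence of 15.7%, a model that never predicts PONV would reach an accuracy of about 84% while having a sensitivity of 0, which is why sensitivity and specificity are reported alongside it.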

Statistical analysis
All statistical analyses and computations were performed using the R software version 3.6.3 (R Development Core Team, Vienna, Austria). The machine learning algorithms were implemented using the following packages: caret (https://CRAN.R-project.org/package=caret), xgboost (https://CRAN.R-project.org/package=xgboost), and keras (https://CRAN.R-project.org/package=keras). The entire code of our study (https://github.com/jgshim/PONV) is provided. Before applying the machine learning models, our dataset was randomly divided into training and test sets at a 70/30 ratio to guard against overfitting and to check that the models generalize well. Specifically, 70% of the data was used for training the prediction models, and 30% was used as the test set for verification. A 10-fold cross-validation repeated three times was used to assess how the predictive model generalizes to an independent dataset. Missing data were imputed using a nearest neighbor imputation algorithm, in which each missing value is replaced by a value obtained from related cases in the entire dataset [22]. The synthetic minority oversampling technique (SMOTE), which addresses imbalanced classification problems, was used to oversample the minority class and compensate for the low incidence of PONV in the training set [23].
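The core idea of the synthetic minority oversampling technique is to create new minority-class cases by interpolating between an existing minority case and one of its nearest minority-class neighbors. The Python sketch below illustrates that idea only; it is not the implementation used in the study, and the function name, the parameter defaults, and the toy feature vectors are hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic cases by interpolating between minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority cases to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other cases
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        # k nearest minority-class neighbors by squared Euclidean distance
        neighbors = sorted(
            others,
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Toy minority-class cases with two normalized features, for illustration only.
minority = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30], [0.30, 0.25]]
synthetic = smote_sketch(minority, n_new=6, k=2, seed=42)
print(len(synthetic))
```

Oversampling is applied to the training set only, as described above, so that the test set keeps the original incidence of PONV. Plain interpolation produces fractional values for binary variables; how categorical inputs were handled during oversampling is not stated in the text.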

Patient characteristics
The sample group included 2,680 patients who received fentanyl-based IV-PCA for analgesia after surgery at Kangbuk Samsung Hospital between July 2019 and July 2020. A total of 23 patients aged ≤18 years were excluded. In addition, 508 patients were excluded because they received regional anesthesia. As a result, a total of 2,149 patients satisfying all inclusion criteria were enrolled in the study. During the 24 h after surgery, 337 patients (15.7%) experienced PONV. The patient characteristics as well as anesthetic, surgical, or PCA-related variables are summarized in Table 2. The correlation analysis showed weak positive correlations of motion sickness, laparoscopy, desflurane, and gynecologic surgery with PONV, as shown in Fig 1. However, male sex and smoking status showed a weak negative correlation with PONV.

Feature selection
Based on previous studies, we identified 21 candidate variables that may contribute to PONV, including patient characteristics as well as anesthetic, surgical, or PCA-related factors. Among these variables, anesthetics and type of surgery were categorical variables with more than two levels. As an input for our models, categorical variables with n levels were transformed into n variables, each with two levels. As a result, 34 variables were initially considered as input variables for the model.
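The transformation of a categorical variable with n levels into n two-level variables is commonly called one-hot encoding. The Python sketch below is a minimal illustration; the function name and the example values are hypothetical, although the three anesthetic levels are those listed in the data collection section.

```python
def one_hot(values, prefix):
    """Turn one categorical column into one 0/1 indicator column per level."""
    levels = sorted(set(values))
    return {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


# Four hypothetical patients, for illustration only.
anesthetic = ["sevoflurane", "desflurane", "TIVA", "sevoflurane"]
encoded = one_hot(anesthetic, "anesthetic")
for name, column in encoded.items():
    print(name, column)
```

Each patient has exactly one indicator set to 1 among the columns derived from the same variable, so a variable with three levels contributes three input columns.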
The recursive feature elimination algorithm reduced these to 13 features contributing to PONV. Fig 2 shows the process of feature selection after the step of recursive feature elimination. Only these 13 features were used as input variables in training the machine learning models for predicting PONV.

Model performance
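The predictive performance of the machine learning models and the Apfel model is shown in Table 3. After applying all machine learning techniques and the Apfel score to the test dataset to predict PONV, we found that the AUROC was 0.576 (95% confidence interval [CI], 0.520–0.633) for logistic regression, 0.597 (95% CI, 0.537–0.656) for k-nearest neighbor, 0.561 (95% CI, 0.498–0.625) for decision tree, 0.610 (95% CI, 0.552–0.668) for random forest, 0.580 (95% CI, 0.520–0.639) for gradient boosting machine, 0.649 (95% CI, 0.592–0.707) for support vector machine (SVM), 0.686 (95% CI, 0.630–0.742) for artificial neural network (ANN), and 0.643 (95% CI, 0.596–0.690) for the Apfel model. The entire code used in this study is available online without restrictions (https://github.com/jgshim/IV-PCA). The detailed hyperparameters of the machine learning models used in this study can be found in the S1 Table.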

Discussion
We analyzed and compared the ability of seven machine learning approaches and the Apfel model to predict PONV during the 24 h after surgery. The results showed that the ANN method had the largest AUROC for identifying PONV using clinical data. The key findings were as follows: (1) machine learning models such as ANN and SVM showed better performance than the Apfel model and (2) feature selection using recursive feature elimination improved human insight into complex and non-linear models associated with PONV. To our knowledge, this is the first study to predict the occurrence of PONV by comparing various machine learning classification approaches with the Apfel model.
Conventional machine learning approaches generally work efficiently with traditional datasets and allow for nonlinear relationships between predictors but may deteriorate with high-dimensional problems [24]. We explored the number of selected features using the wrapper algorithm in the recursive feature elimination procedure. Dimensionality reduction can eliminate dependencies and collinearity that may exist in the model and thereby improve performance.
Although volatile anesthetics were an important risk factor for PONV in a previous study [25], in this study, the type of anesthetic (volatile or intravenous) was not helpful in predicting PONV. An increase in the duration of anesthesia was associated with a reduction in PONV, which is inconsistent with previous results [26]. The increase in PONV with the use of preintubation opioids or laparoscopic surgery is supported by previous studies [27,28]. Hypotension occurring during shoulder surgery may be a major factor in PONV and was used as one of the features in our study [12,29]. We believe that anesthesiologists in the operating room can help manage PONV. For example, considering the possibility of PONV, the ANN or SVM model could be useful in deciding whether to take preemptive measures, such as preparing an antiemetic in advance or continuing follow-up and observation. Furthermore, cost-effective management will be possible because the models require only 13 clinical variables to identify patients at a high risk of PONV.
Patients with a history of chemotherapy are at high risk for opioid-induced PONV [30]. However, the number of postchemotherapy patients was very small. Even if patients had a history of cancer, it was not clear whether they had received chemotherapy. Thus, we were not able to include postchemotherapy status as an input variable. Including chemotherapy as an input factor in further studies may improve the performance of the PONV prediction model.
It is clear that PONV is a distressing side effect in patients. A suitable screening test for PONV should have adequate sensitivity and specificity and be acceptable to both patients and medical practitioners. High sensitivity with low specificity may lead to inappropriate preemptive measures. For instance, a patient with a low risk of PONV might be given antiemetics. Therefore, careful attention should be paid when the models are used as a screening tool.
There are some limitations to our study. First, we are not sure that the amount of data we used was sufficient for this machine learning problem, considering the complexity of the problem and the nonlinear algorithms. Further data about PONV should be collected to improve the predictive power. Second, the PCA team of the anesthesiology department visited each patient only once, on the day after surgery, and asked about the effects and complications of PCA. As a result, because of recall bias, the PONV occurrence rate may have been underestimated. Third, because our study analyzed data from a single center, it might not be possible to apply our model to a wider population. Further studies with large heterogeneous samples are needed to improve generalizability.

Conclusions
In summary, we developed various machine learning models and compared them with the Apfel model for predicting the occurrence of PONV in patients receiving IV-PCA. We expect our results to help reduce PONV by helping clinicians predict it and take preemptive actions.

Supporting information
S1 Table. Optimal hyperparameters of all machine learning models. (DOCX)